Base Plotting in R

  • R comes with some very powerful plotting capabilities
    • Provided in the base package graphics
    • Always loaded with every session
  • Examples are often extremely helpful
  • Used happily by many people for decades
    • The release of ggplot2 changed everything
  • Let’s quickly explore base plotting before moving to the good stuff

Start a New R Script

  • Call the new script: IntroVisualisation.R
  • Load our favourite packages at the top of the script
library(palmerpenguins)
library(ggplot2)
library(dplyr)

Base Plotting In R

  • Simple plots are usually easy
    • Complex figures can get really messy
  • Using the cars dataset
    • speed (mph)
    • dist: distance (ft) each car takes to stop
plot(cars)
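plot(cars) uses the first column as x and the second as y. A minimal sketch of the same figure written out explicitly, using the formula interface and axis labels. The fitted straight line drawn with abline() is an illustrative addition, not part of the original slide:

```r
# Same scatterplot as plot(cars), with the variables named and the axes labelled
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance vs speed")

# Fit a simple linear regression and overlay it on the existing plot
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red", lwd = 2)

coef(fit)
```

Base graphics draw onto the current plot, so abline() adds to the scatterplot rather than starting a new figure.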

Base Plotting In R

  • The first two columns were automatically placed on the x & y axes
  • We could set values for x & y manually
    • Switching back to the penguins here
  • R automatically chose to draw the data as points
## Plot calling individual columns from penguins using `$`
plot(x = penguins$bill_depth_mm, y = penguins$bill_length_mm)

Base Plotting In R

  • The function boxplot() can also create simple figures easily
  • For categorical variables (i.e. factors) we can use the formula notation
    • y ~ x \(\implies\) y depends on x
## Make a simple boxplot showing the weights by species
boxplot(body_mass_g ~ species, data = penguins)
  • By default, the dependent variable appears on the y-axis
  • The predictor appears on the x-axis (horizontal = TRUE swaps them)
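boxplot() also returns the statistics it draws. A minimal sketch, assuming palmerpenguins is installed: with plot = FALSE nothing is drawn, and the returned list can be inspected or checked against summaries computed by hand:

```r
library(palmerpenguins)

# plot = FALSE returns the statistics without drawing anything
b <- boxplot(body_mass_g ~ species, data = penguins, plot = FALSE)

b$names  # one box per species
b$n      # number of non-missing observations behind each box
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```

The middle row of b$stats is the ordinary median of each group, which is a quick way to confirm what the line inside each box represents.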

Base Plotting In R

  • We can also use combinations of predictor variables
## Separate by species and sex
boxplot(body_mass_g ~ sex + species, data = penguins)

Base Plotting In R

  • Histograms can be produced on an individual column
    • The number of breaks can be set manually (a single number is only a suggestion; R rounds to "pretty" break points)
  • The default is pretty useful here
    • Base histograms work best as simple, quick figures
hist(penguins$body_mass_g, breaks = 20, xlab = "Body Mass (g)")
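Because breaks = 20 is only a suggestion, the number of bins drawn may differ from 20. A sketch comparing that with a vector of break points, which gives exact control; the 200 g bin width is an arbitrary choice for illustration:

```r
library(palmerpenguins)

mass <- penguins$body_mass_g

# A single number is only a suggestion: R rounds the break points to "pretty" values
h_suggested <- hist(mass, breaks = 20, plot = FALSE)
length(h_suggested$breaks) - 1  # number of bins actually used

# A vector of break points gives exact control (here, 200 g wide bins)
lower <- floor(min(mass, na.rm = TRUE) / 200) * 200
upper <- ceiling(max(mass, na.rm = TRUE) / 200) * 200
h_exact <- hist(mass, breaks = seq(lower, upper, by = 200), plot = FALSE)
diff(h_exact$breaks)  # every bin is 200 g wide
```

hist() silently drops missing values, so the counts add up to the number of recorded body masses.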

Base Plotting In R

  • Datasets with many columns can be quickly explored using pairs()
  • Shows all pairwise combinations of columns
    • Categorical columns are converted to integer codes, so they are less informative
pairs(penguins)
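One way to make pairs() more informative is to keep only the numeric measurements and colour the points by species. A sketch under that assumption; the column selection and point styling are illustrative choices:

```r
library(palmerpenguins)

# Keep only the four numeric measurement columns
measurements <- penguins[, c("bill_length_mm", "bill_depth_mm",
                             "flipper_length_mm", "body_mass_g")]

# Colour each point by species: the factor codes 1, 2, 3 select palette colours
pairs(measurements,
      col = as.integer(penguins$species),
      pch = 19, cex = 0.6)
```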

The Grammar of Graphics

  • ggplot2 has become the industry standard for visualisation (Wickham 2016)
  • Core & essential part of the tidyverse
  • Developed by Hadley Wickham during his PhD
  • An implementation of The Grammar of Graphics (Wilkinson 2005)
    • Breaks visualisation into layers

The Grammar of Graphics

Figure taken from https://r.qcbs.ca/workshop03/book-en/grammar-of-graphics-gg-basics.html

The Grammar of Graphics

Everything is added in layers

  1. Data
    • Usually a data.frame (or tibble)
    • Can be piped in \(\implies\) modify on the fly
  2. Aesthetics
    • x & y co-ordinates
    • colour, fill, shape, size, linetype
    • grouping & transparency (alpha)
  3. Geometric Objects
    • points, lines, boxplot, histogram, bars etc
  4. Facets: Panels within plots
  5. Statistics: Computed summaries
  6. Coordinates
    • polar, map, cartesian etc
    • defaults to cartesian
  7. Themes: overall layout
    • default themes automatically applied
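The seven layers map onto code roughly one line each. A sketch using penguins; the particular choices (faceting by species, a linear smoother, theme_bw()) are illustrative, and coord_cartesian() is written out even though it is already the default:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins,                                       # 1. Data
            aes(x = bill_depth_mm, y = bill_length_mm)) +   # 2. Aesthetics
  geom_point(na.rm = TRUE) +                                # 3. Geometry
  facet_wrap(~species) +                                    # 4. Facets
  stat_smooth(method = "lm", formula = y ~ x,
              na.rm = TRUE) +                               # 5. Statistics
  coord_cartesian() +                                       # 6. Coordinates (the default)
  theme_bw()                                                # 7. Theme
p
```

Facets, coordinates and themes apply to the whole plot, so their position in the chain of + does not change the result; only the order of geoms and stats affects which is drawn on top.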

An Initial Example

  • Using the example dataset cars
  • Two columns:
    • speed (mph)
    • dist: distance (ft) each car takes to stop
  • We can make a classic x vs y plot using points
  • The predictor (x) would be speed
  • The response (y) would be distance

An Initial Example

  • We may as well start by piping our data in
cars |>
  ggplot(aes(x = speed, y = dist))
  • We have defined the plotting aesthetics
    • x & y
    • The names x = and y = can be omitted if the values are passed in that order
  • Axis ranges are already set from the data
  • No geometry has been specified \(\implies\) empty axes, no data drawn
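A quick check that naming the aesthetics is optional when they are passed in order: the first two arguments of aes() are x and y. The object names named and positional are illustrative:

```r
library(ggplot2)

# Named and positional aesthetics produce the same mapping
named <- aes(x = speed, y = dist)
positional <- aes(speed, dist)
names(positional)

# Either version gives the same empty set of axes, because no geometry is added yet
cars |>
  ggplot(positional)
```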

An Initial Example

  • To add points, we add geom_point() after calling ggplot()
  • Adding + after ggplot() says “But wait! There’s more…”
cars |>
  ggplot(aes(x = speed, y = dist)) + 
  geom_point() 
  • ggplot2 predates both pipes (%>% and |>), so layers are joined with + rather than a pipe

An Initial Example

  • To add points, we add geom_point() after calling ggplot()
    • Adding + after ggplot() says “But wait! There’s more…”
cars |> # Layer 1: Data
  ggplot(aes(x = speed, y = dist)) + # Layer 2: Aesthetics
  geom_point() # Layer 3: Geometry
  • By default:
    • Layer 4: No facets
    • Layer 5: No summary statistics
    • Layer 6: Cartesian co-ordinate system
    • Layer 7: Crappy theme with grey background 🤮

An Initial Example

  • A simple summary statistic to add might be stat_smooth()
  • Automatically chooses the smoother (method = NULL)
    • loess for fewer than 1,000 observations, otherwise a GAM fitted with mgcv
    • A confidence band (95% by default, se = TRUE) is shown around the curve
cars |> # Layer 1: Data
  ggplot(aes(x = speed, y = dist)) + # Layer 2: Aesthetics
  geom_point() + # Layer 3: Geometry
  stat_smooth() # Layer 5: Statistics
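To request a straight line instead of the automatic choice, set the method explicitly. A sketch, assuming a 90% band is wanted purely for illustration; it also uses layer_data() to pull out the values ggplot2 computed for the smoothing layer, which can be checked against lm() directly:

```r
library(ggplot2)

# Supplying formula as well as method stops stat_smooth() printing its
# "using method = ..." message
p <- cars |>
  ggplot(aes(x = speed, y = dist)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, level = 0.90) # 90% band instead of 95%

# Layer 2 is the smoother: fitted line (y) plus the band limits (ymin, ymax)
smooth_vals <- layer_data(p, 2)
head(smooth_vals[, c("x", "y", "ymin", "ymax")])
```

The fitted y values are the same predictions you would get from lm(dist ~ speed, data = cars), evaluated on a grid of speeds.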

Visualising Our Penguins


What visualisations could we produce to inspect penguins?

Creating Our First Plot

## Compare the two bill measurements
penguins |> # Layer 1: Data
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + # Layer 2: Aesthetics
  geom_point()  # Layer 3: Geometry

Creating Our First Plot

  • There seem to be groups. Are these based on species? \(\implies\) Add colour
## Compare the two bill measurements
penguins |> 
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point()

Creating Our First Plot

  • We can also add regression lines
    • We’ll add equations later
    • Try without the se = FALSE and see what happens
## Add regression lines as a new layer
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) # Layer 5: Statistics

Understanding Aesthetics

  • Setting the colour in the call to ggplot() \(\implies\) all layers will use this
  • If we shift colour = species to geom_point() \(\implies\) ???
## Only use colour for the points
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE)

Understanding Aesthetics

  • We could set colour separately within each layer if we choose
## Set colour for the points and regression lines separately
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(aes(colour = species), method = "lm", se = FALSE)
  • It’s clunky here, but can give fine control for complex plots
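The difference between the two placements of colour = species shows up in how the smoothing layer is grouped. A sketch comparing the two (the plot names p_global and p_local are illustrative), counting the groups that stat_smooth() fits:

```r
library(ggplot2)
library(palmerpenguins)

# colour in ggplot(): every layer inherits it, so one line is fitted per species
p_global <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour in geom_point() only: the smoother sees no grouping, so one line for all penguins
p_local <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of groups used by the smoothing layer (layer 2) in each plot
length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_local, 2)$group))
```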

Using Facets

  • Alternatively, we could plot each species in its own panel (or facet)
  • The ~ notation says the panels are defined by species
## Plot each species in a separate panel
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species) # Layer 4: Facets

Using Facets

  • We can allow x and y axes to scale separately for each panel
    • Not always a helpful strategy: panels on different axes are harder to compare
## Plot each species in a separate panel, allowing axes to be scaled freely
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species, scales = "free") 

Scales

Setting Scales

  • By default, ggplot2 will detect the most appropriate scale
    • Has applied scale_x_continuous() and scale_y_continuous()
# Explicitly set the scales. This will appear identical
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_x_continuous() +
  scale_y_continuous()
  • Multiple presets are available:
    • scale_x_log10(), scale_x_sqrt(), scale_x_reverse()
    • Also available for y
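A sketch of one of the presets, putting stopping distance on a log10 axis. Scale transformations are applied before any statistics are computed, which can be confirmed by looking at the values stored for the point layer:

```r
library(ggplot2)

p <- cars |>
  ggplot(aes(x = speed, y = dist)) +
  geom_point() +
  scale_y_log10()

# The y values stored for the point layer are log10(dist), not dist
head(layer_data(p, 1)$y)
```

This matters when adding a smoother: on a log scale, stat_smooth() fits to the transformed values, not the original ones.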

Setting Scales

  • For aesthetics like colour, we often want to tailor these
    • Default is scale_colour_discrete() (Meh…)
  • Many alternatives exist
    • scale_colour_brewer(), scale_colour_viridis_d()
## Check the default palette for scale_colour_brewer
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer()
  • Default palettes can be good sometimes (scale_colour_brewer() defaults to the sequential "Blues")
    • To show options for scale_colour_brewer() \(\implies\) RColorBrewer::display.brewer.all()

Setting Scales

  • I often use Set1, but try a few others
  • scale_colour_viridis_d() will give a colourblind-friendly palette
    • Other palettes are provided by other packages
## Set the palette for scale_colour_brewer to be "Set1" or anything else
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer(palette = "Set1")
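The hex codes behind a Brewer palette can be looked up with RColorBrewer::brewer.pal(). A sketch, assuming RColorBrewer is installed (it is a dependency of ggplot2's scales package), that compares those codes with the colours ggplot2 actually assigned:

```r
library(ggplot2)
library(palmerpenguins)
library(RColorBrewer)

# The first three colours of "Set1", one for each species
set1 <- brewer.pal(3, "Set1")
set1

p <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")

# Colours assigned to the points
unique(layer_data(p, 1)$colour)
```

Knowing the hex codes is handy when the same colours need to be reused in another figure or passed to scale_colour_manual().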

Setting Scales

  • A colourblind-friendly palette (Wong 2011; black plus seven colours) is included in ggthemes
    • Many alternatives exist
    • This one is written by Americans \(\implies\) weird spelling of colourblind
library(ggthemes)
## Use the colourblind friendly palette provided by ggthemes
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

What Else Can We Do?

  • What else might be informative?
  • Can we separate by island or sex?
    • sex will have missing values
    • Let’s set the shape of the points
## Try setting different point shapes based on the recorded sex
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex)) + # Now we have a layer-specific aesthetic 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

Modifying Points

  • We can change the point size by setting it outside aes()
    • A fixed value applies to every point \(\implies\) it will not respond to the data
## Increase the size of all points
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + # Change the point size
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()
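Contrasting a fixed size with a mapped size makes the distinction concrete. A sketch (plot names are illustrative) in which size is set outside aes() in one plot and mapped to body mass inside aes() in the other:

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# Fixed size: outside aes(), identical for every point, no legend
p_fixed <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(size = 3)

# Mapped size: inside aes(), so it varies with body mass and gets a legend
p_mapped <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(size = body_mass_g))

unique(layer_data(p_fixed, 1)$size)
range(layer_data(p_mapped, 1)$size)
```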

Modifying Points

  • Point shapes can be set manually using scale_shape_manual()
    • Also scale_colour_manual()
## Try setting different point shapes based on the recorded sex
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) ## Manually choose the point shapes
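scale_colour_manual() works the same way, and a named vector pins each colour to a specific category regardless of factor level order. A sketch; the three hex codes are assumed to be the orange, sky blue and bluish green of the Wong (2011) palette, chosen here for illustration:

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# Named vector: each species gets its colour explicitly
species_cols <- c(Adelie = "#E69F00", Chinstrap = "#56B4E9", Gentoo = "#009E73")

p <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) +
  scale_colour_manual(values = species_cols) +
  scale_shape_manual(values = c(female = 19, male = 1)) # named shapes work the same way
p
```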

Modifying Points

  • How did I know to choose those two values?
  • Why do numbers represent different shapes?
  • Enter ?pch and scroll down a little
    • 19 is a solid circle and 1 is an open circle
    • Shapes 21-25 have both a colour (outline) and a fill
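The table in ?pch can also be reproduced as a plot. A sketch that draws every shape code from 0 to 25 with its number underneath; the grid layout (columns col and row) is an arbitrary choice, and scale_shape_identity() tells ggplot2 to use the numbers directly as shapes:

```r
library(ggplot2)

# One row per shape code, laid out on a 13 x 2 grid
shapes <- data.frame(shape = 0:25)
shapes$col <- shapes$shape %% 13
shapes$row <- -(shapes$shape %/% 13)

p <- ggplot(shapes, aes(x = col, y = row)) +
  geom_point(aes(shape = shape), size = 5, colour = "black", fill = "orange") +
  geom_text(aes(label = shape), nudge_y = -0.35) + # print the code under each symbol
  scale_shape_identity() +                         # use the numbers as shapes
  theme_void()
p
```

Only shapes 21-25 show the orange fill; for the others the fill is ignored.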

Finishing Our Figure

  • The next step in making our figure look brilliant
    • Axis & Scale labels
## Add axis and legend labels
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    # Manually add labels
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 

Finishing Our Figure

  • The final layer in the Grammar of Graphics is the Theme
  • Controls the overall appearance not controlled elsewhere
  • The code can get long so let’s save that figure as p
    • Then I can modify on a single slide
## Save the figure for exploring theme attributes
p <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 

Themes

Using Themes

  • A default theme is applied: theme_grey()
  • I prefer theme_bw()
    • Removes the grey background
    • Gets most of the job done
p + theme_bw()

Using Themes

  • Additional modifications:
    • Setting the base font size for all annotations
    • Can also set colour, alignment, font face, font family etc
    • Vital for publishing figures
p + theme_bw() +
  theme(text = element_text(size = 14))
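When several figures should share a look, the theme can be stored once and added to each plot. A sketch with a hypothetical my_theme object; theme_bw(base_size = 14) sets the starting font size that the other text elements scale from:

```r
library(ggplot2)

# Store the theme once, reuse it on every figure
my_theme <- theme_bw(base_size = 14) +
  theme(legend.position = "bottom")

p1 <- ggplot(cars, aes(speed, dist)) + geom_point() + my_theme
p2 <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point() + my_theme

# ggplotGrob() renders each plot to a gtable without drawing it on a device
g1 <- ggplotGrob(p1)
g2 <- ggplotGrob(p2)
```

Changing my_theme in one place then updates every figure that uses it.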

Using Themes

  • Any individual text element of a figure can be modified using element_text()
p + theme_bw() +
  theme(
    ## A slightly exaggerated modification of axis titles
    axis.title = element_text(colour = "darkred", size = 16, face = "bold")
  )

Using Themes

  • A key application is the placement of legends
p + theme_bw() +
  theme(legend.position = "bottom")

Using Themes

  • Legend can be placed inside using three steps (ggplot2 ≥ 3.5.0)
    • Set legend.position = "inside"
    • Set where the legend goes with legend.position.inside (co-ordinates from 0 to 1)
    • Set which part of the legend aligns at those co-ordinates with legend.justification.inside
  • Can be extremely finicky
p + theme_bw() +
  theme(
    legend.position = "inside", # Ensure the legend is inside the plotting region
    legend.position.inside = c(0, 0), # Anchor to the bottom left
    legend.justification.inside = c(0, 0) # Set the alignment to be bottom left
  )

Different Plot Types


Classic Barplots

  • geom_bar() & geom_col()
  • geom_errorbar() & geom_errorbarh()

Classic Distribution Plots

  • geom_boxplot() & geom_violin()
  • geom_density() & geom_histogram()

Line-based Geometry

  • geom_line(), geom_segment()
  • geom_abline(), geom_hline() & geom_vline()

Heatmaps and Grids

  • geom_raster(), geom_tile() & geom_rect()
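As one example from the last group, a heatmap with geom_tile(). A sketch, assuming the summary of interest is mean body mass for each species and island combination (the name mass_grid is illustrative); combinations absent from the data are simply left blank:

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

# Mean body mass for every species x island combination present in the data
mass_grid <- penguins |>
  summarise(mean_mass = mean(body_mass_g, na.rm = TRUE), .by = c(species, island))

# One tile per combination, filled by the mean and labelled with the rounded value
p <- mass_grid |>
  ggplot(aes(x = island, y = species, fill = mean_mass)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = round(mean_mass))) +
  scale_fill_viridis_c() +
  theme_bw()
p
```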

Creating A Boxplot

  • A starting point might be to choose island as the predictor
  • body_mass_g may be a response variable
penguins |> 
  ggplot(aes(island, body_mass_g)) +
  geom_boxplot()

Creating Our Boxplot

  • To incorporate sex \(\implies\) add a fill aesthetic
    • colour is generally applied to shape outlines
## Fill the boxes by sex
penguins |> 
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()
  • ggplot2 draws a separate box for each sex within each island, side by side (missing values get their own box)

Creating Our Boxplot

## Remove the penguins with no recorded sex
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()

Creating Our Boxplot

  • We could also separate by species using facet_wrap()
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_wrap(~species, scales = "free_x")

Creating Our Boxplot

  • A less-intuitive alternative, facet_grid() with space = "free_x", allows panels of unequal width
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_grid(~species, scales = "free_x", space = "free_x")
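  • Overlaying the raw data with geom_jitter() shows every penguin; outliers = FALSE stops the boxplot drawing outliers twice
penguins |> 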
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_boxplot(outliers = FALSE) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~species + island, nrow = 1)

Trying a Violin Plot

penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_violin(draw_quantiles = 0.5, trim = FALSE) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~species + island, nrow = 1)

Creating A Histogram

penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(fill = "grey70", colour = "black") +
  facet_grid(species ~ sex) +
  theme_bw()
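geom_histogram() defaults to 30 bins and prints a message suggesting a better choice. A sketch of the same figure with an explicit binwidth, assuming 200 g bins for illustration; binwidth is in data units:

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

p <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(binwidth = 200, fill = "grey70", colour = "black") +
  facet_grid(species ~ sex) +
  theme_bw()

# The computed bins for every panel: each runs from xmin to xmax
bins <- layer_data(p, 1)
head(bins[, c("PANEL", "xmin", "xmax", "count")])
```

The bins argument is the alternative when a fixed number of bins matters more than their width.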

Creating a Summary Barplot

penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) |>
  ggplot(aes(sex, weight_mn, fill = sex)) +
  geom_col() +
  geom_errorbar(
    aes(ymin = weight_mn - weight_sd, ymax = weight_mn + weight_sd),
    width = 0.2
  ) +
  facet_wrap(~species, nrow = 1) +
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()

Saving Images

  • The simple way is to click Export in the Plots pane
  • To save using code:
ggsave("myplot.png", width = 7, height = 7, units = "in")
  • By default this saves the last plot displayed (last_plot())
  • Output format is determined by the file extension
  • Try saving as a pdf…
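Passing the plot explicitly makes a script independent of whichever figure happened to be shown last. A sketch that saves a PDF; writing to tempdir() is only to keep the example from cluttering the working directory, and in practice you would use a path inside your project:

```r
library(ggplot2)

p <- ggplot(cars, aes(speed, dist)) + geom_point()

# The ".pdf" extension selects the PDF device
out_file <- file.path(tempdir(), "cars_plot.pdf")
ggsave(out_file, plot = p, width = 7, height = 5, units = "in")
```

For raster formats such as .png, the dpi argument (300 by default) sets the resolution.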

Saving Images

  • I think saving using code is preferable
  • Modify an analysis or data \(\implies\) saved figures will also update
    • This saves time & ensures reproducibility

Conclusion

A fabulous resource: https://r-graphics.org/

References

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.
Wong, Bang. 2011. “Color Blindness.” Nat. Methods 8 (6): 441.